Passive Realtime Datacenter Fault Detection and Localization

Authors

  • Arjun Roy
  • Hongyi Zeng
  • Jasmeet Bagga
  • Alex C. Snoeren
Abstract

Datacenters are characterized by their large scale, stringent reliability requirements, and significant application diversity. However, the realities of employing hardware with small but non-zero failure rates mean that datacenters are subject to significant numbers of failures, impacting the performance of the services that rely on them. To make matters worse, these failures are not always obvious; network switches and links can fail partially, dropping or delaying various subsets of packets without necessarily delivering a clear signal that they are faulty. Thus, traditional fault detection techniques involving end-host or router-based statistics can fall short in their ability to identify these errors.

We describe how to expedite the process of detecting and localizing partial datacenter faults using an end-host method generalizable to most datacenter applications. In particular, we correlate transport-layer flow metrics and network-I/O system call delay at end hosts with the path that traffic takes through the datacenter and apply statistical analysis techniques to identify outliers and localize the faulty link and/or switch(es). We evaluate our approach in a production Facebook front-end datacenter.
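
The idea of correlating per-flow metrics with the path each flow takes, then looking for statistical outliers, can be illustrated with a small sketch. This is not the authors' implementation: the choice of retransmit rate as the metric, the median-absolute-deviation (modified z-score) test, the 3.5 threshold, the link names, and the function names `link_retransmit_rates` and `outlier_links` are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median


def link_retransmit_rates(flows):
    """Charge each flow's packets and retransmits to every link on its path.

    `flows` is an iterable of (path, packets, retransmits) tuples, where
    `path` is a sequence of link names (hypothetical identifiers).
    """
    pkts = defaultdict(int)
    retx = defaultdict(int)
    for path, packets, retransmits in flows:
        for link in path:
            pkts[link] += packets
            retx[link] += retransmits
    return {link: retx[link] / pkts[link] for link in pkts if pkts[link] > 0}


def outlier_links(rates, threshold=3.5):
    """Return links whose rate is an upper outlier by modified z-score.

    Uses the median absolute deviation (MAD); the threshold is an assumed
    default, not a value taken from the paper.
    """
    values = list(rates.values())
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # Simplification: with no spread, flag anything above the median.
        return sorted(l for l, r in rates.items() if r > med)
    return sorted(
        l for l, r in rates.items() if 0.6745 * (r - med) / mad > threshold
    )


# Synthetic example: 5 uplinks x 5 downlinks, one flow per pair.
# Flows crossing the (assumed faulty) uplink "u3" see extra retransmits.
flows = [
    ((f"u{u}", f"d{d}"), 1000, 41 if u == 3 else 1)
    for u in range(1, 6)
    for d in range(1, 6)
]
suspects = outlier_links(link_retransmit_rates(flows))
print(suspects)  # ['u3']
```

Because every flow through the faulty uplink also crosses a downlink, the downlinks' rates rise slightly as well; comparing each link against the population median is what separates the single bad link from the links that merely share traffic with it.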

Similar resources

Identification and Robust Fault Detection of Industrial Gas Turbine Prototype Using LLNF Model

In this study, detection and identification of common faults in industrial gas turbines are investigated. We propose a model-based robust fault detection (FD) method based on multiple models. For residual generation, a bank of Local Linear Neuro-Fuzzy (LLNF) models is used. Moreover, in the fault detection step, a passive approach based on an adaptive threshold is employed. To achieve this purpose, the a...

Fault Identification using end-to-end data by imperialist competitive algorithm

Faults in computer networks may result in millions of dollars in cost. Faults in a network need to be localized and repaired to keep the health of the network. Fault management systems are used to keep today’s complex networks running without significant cost, either by using active techniques or passive techniques. In this paper, we propose a novel approach based on imperialist competitive alg...

A Fault Diagnosis Algorithm of Analog Circuits Based on Node-voltage Relation

A new method of diagnosis of single faults of passive elements in analog electronic circuits, based on the node-voltage relation approach, is presented. This method consists of two parts: creation of a fault dictionary describing the nominal state of the tested circuit and containing indirect parameters representing respective faults, and a new fault detection and localization algorithm.

Intrusion Detection System to Detect Wormhole Using Fault Localization Techniques

In this paper, we present a strategy to detect an intrusion using fault localization tools. We propose an intrusion detection system to detect a self-contained in-band wormhole attack using a combination of active probing and passive monitoring tools. We exploit anomaly in the end-to-end delay and per-hop delay patterns to identify the nodes involved in a wormhole attack. We present an architec...

Publication date: 2017